A probabilistic early fault detection model for a feedback machining system with multiple types of spares

This paper studies corrective and preventive maintenance as the basis of a quality control policy. A time-dependent corrective maintenance model is presented for a feedback machining system with a finite source and standbys. The system has a known number of servers to repair damaged units and an inspector to ensure the quality of the repaired units. The exact probability of n units in the system is obtained with an efficient algorithm based on the Laplace transform. To promote preventive maintenance, this probability is then used to obtain the probability of early fault detection, both as a function of time and in the steady state. The applicability of the model is discussed for different system capacities.


List of symbols
$P_n(t)$: probability of n stopped machines (units) in the system at time t
$P_0(t)$: probability of no stopped units in the system at time t
n: number of stopped units, $0 \le n \le N$
N: system capacity (all machines in the system)
C: number of servers (technical staff who replace damaged parts)
S: number of spares (standby machines)
q: probability that a repaired and effective machine departs to the operating system
ε: spares rate
$P_D(t,n,z)$: probability of detecting a further damage in the future
$E_D(t,n,z)$: mean detection time

Production quality control is one of the most important means of increasing production. It helps identify the causes of poor production without the time lost when one of the production lines fails. Such a failure reduces the factory's productivity and prevents orders from being delivered on time. Customers may then lose patience and withdraw from the service system, which causes a loss to the factory. A mechanism must therefore be put in place to replace a stopped machine with a working one so that production continues.
Replacing damaged units with spares is highly important in production because it greatly reduces the failure rate of the production lines. It gives rise to an important type of finite-source queueing system, in which stopped units are replaced by working ones so that the system keeps operating. Shekhar et al. 1-3 and Jain et al. 4,5 presented many classical models under particular conditions to study the performance measures of machine-repair queueing systems. Once a damaged unit (one of the stopped servers) has been replaced by a spare and repaired, it in turn serves as a replacement for another damaged unit. The M/M/C/N queueing model resembles a factory with a known number of production lines. If spares are used to avoid wasted time and cost, we obtain the model studied by Shortle et al. 6, who applied the cold-spares approach to the M/M/C/N model and compared it with the classical M/M/C/N model. Gupta et al. 7,8 and Jain et al. 9 studied the relationships significant models to detect the lost targets, such as El-Hadidy 42. Taken together, these works provide a basis for developing a new model that addresses the preventive maintenance problem.
This paper is organized as follows. "List of symbols" provides a glossary of the terms used in this work. The description and formulation of our model are presented in "Model formulation". Based on the Poisson process, this section sets up two probabilistic systems that express the probability functions in a suitable form. In "Corrective maintenance for transient behaviour", we study the transient behaviour of the model. We apply the Laplace transform to the probabilistic systems of "Model formulation" to obtain the exponential of the coefficient matrix of each system. The exact solution of these systems is then obtained with an associated algorithm and Maple code. A special case of cold spares is presented in "Cold spares case". "Fault detection as a preventive maintenance method" uses this solution to obtain the probability of detection and the mean time to detection as functions of time. The final section presents the conclusion and future work.

Model formulation
Consider a feedback machining system with a finite source and standbys, modelled as an M/M/C/N queueing system with spares (see …). Each unit is served according to the first-come, first-served (FCFS) discipline. The service times and the inter-arrival times of units are independent and identically distributed (iid) exponential random variables with rates $\mu > 0$ and $\lambda > 0$, respectively, where $0 < \mu < \lambda$. The stopped machines join a single waiting line for the repair crews, as shown in Fig. 1. Arrivals occur one at a time according to a Poisson process.
We consider a state-dependent failure rate, where ε is a parameter that characterizes the type of spares and $0 \le \varepsilon \le \lambda$. As in Shekhar et al. 1, if $\varepsilon = 0$ we have the cold spares case, if $0 < \varepsilon < \lambda$ the warm spares case, and if $\varepsilon = \lambda$ the hot spares case. The range of ε is therefore tied to the rate λ of the inter-arrival time distribution. The repair (service) mechanism likewise follows an exponential distribution, with state-dependent repair rates.
www.nature.com/scientificreports/
When a malfunction occurs in one of the machines, it is replaced by another one, and the corrective maintenance of the damaged machine begins. The repaired unit then becomes a standby for a newly damaged unit. In this type of maintenance, a group of servers carries out a set of operations to repair the machines. Maintenance times vary because the repair process differs from one machine to another. After a failed machine has been repaired, it is passed to the inspector to check its effectiveness. If the machine is still defective, the inspector returns it to the end of the original waiting line as a feedback unit, with probability $1 - q$, to be repaired again (see Fig. 1). Otherwise, the non-defective machine becomes a standby that can replace another stopped or failed machine in the operating system. In the case of feedback, we define the inspection events $\omega_i$, $i = 1, 2, \dots, N$, as
$$\omega_n = \begin{cases} 1, & \text{if the repaired machine is inspected when there are } n \text{ units in the system},\\ 0, & \text{otherwise}. \end{cases}$$
Here, all repaired machines are examined by the inspector to determine the quality of the repair process. Under this inspection policy, if there are n units in the system and the repaired machine is inspected, then $\omega_n = 1$, and the inspector determines whether the machine must be repaired again. Otherwise, $\omega_n = 0$. If the inspector rejects the repaired machine, it is fed back. Feedback after a failed inspection is therefore not random, because it depends on the inspector's decision. When $\omega_n = 0$, the repaired machine has priority to become a standby without inspection. This may be a problem in systems where every repaired machine must be inspected before entering the standby list. We consider these hypotheses for the warm and hot spares cases (i.e., $\varepsilon \neq 0$) and apply the Markov conditions. This gives the basic systems of probability differential-difference equations provided in Kotb and El-Ashkar 11: the system (4)-(9) if $C \le S$, and the system (10)-(15) if $C > S$.
Our model is similar to the M/M/C/N queueing system with spares studied by Shortle et al. 6. It generalizes the M/M/C/N queueing system by combining it with S spares. When a machine fails in the operating system, it enters the queueing model and is replaced with a spare. After repair, it enters the inspection stage. Slow replacement of damaged machines with suitable spares reduces the efficiency of the service provided. A repaired machine that passes inspection becomes a new standby and is used again in the operating system. The next section studies the transient behaviour of the model, obtaining $P_n(t)$ and $P_0(t)$ by solving the systems (4)-(9) and (10)-(15). The performance measures of the model are then given to show its effectiveness.
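The failure and repair rates referred to in this section did not survive in this text. The sketch below assumes the standard warm-spare forms common in the machine-repair literature: $\lambda_n = N\lambda + (S-n)\varepsilon$ for $0 \le n < S$, $\lambda_n = (N+S-n)\lambda$ for $S \le n < N+S$, and $\mu_n = \min(n, C)\,\mu$. These forms are an assumption, not the paper's equations. The function names `failure_rate`, `repair_rate` and `spare_type` are hypothetical. Under these forms, the failure rate changes regime at $n = S$ and the repair rate at $n = C$. That would explain why separate systems are needed for $C \le S$ and $C > S$.

```python
# Hypothetical state-dependent rates for N operating machines, S spares and
# C repair servers. ASSUMPTION: standard warm-spare forms, not the paper's equations.

def failure_rate(n, N, S, lam, eps):
    """lambda_n: total failure rate when n units are down (n = 0..N+S)."""
    if 0 <= n < S:
        # N machines operate at rate lam; the S - n idle spares fail at rate eps
        return N * lam + (S - n) * eps
    if S <= n < N + S:
        # spares exhausted: only N + S - n machines are still operating
        return (N + S - n) * lam
    return 0.0  # all N + S units are down, so no further failures


def repair_rate(n, C, mu):
    """mu_n: total repair rate with C servers, each working at rate mu."""
    return min(n, C) * mu


def spare_type(eps, lam):
    """Classify the spares by eps (cold: 0, warm: between 0 and lam, hot: lam)."""
    if eps == 0:
        return "cold"
    if 0 < eps < lam:
        return "warm"
    if eps == lam:
        # with eps == lam both branches of failure_rate equal (N + S - n) * lam
        return "hot"
    raise ValueError("eps must satisfy 0 <= eps <= lam")


N, S, lam, mu, eps = 4, 3, 2.3, 4.72, 2.0  # parameter values of Example 1
for C in (2, 3):                           # C <= S, then C > S
    print(spare_type(eps, lam), "C =", C,
          [round(failure_rate(n, N, S, lam, eps), 2) for n in range(N + S + 1)],
          [round(repair_rate(n, C, mu), 2) for n in range(N + S + 1)])
```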

Corrective maintenance for transient behaviour
Corrective maintenance is a type of maintenance activity undertaken to restore equipment that has failed to meet an acceptable condition. It is essentially a correction process adopted after a breakdown has occurred. Corrective maintenance aims to get machines back up and running as soon as possible to minimize production downtime. These goals are directly related to production capacity and costs, product quality, and consumer satisfaction. It also aims to control the investment required for backup machines, which must replace the failed machines until the repairs are completed. As stated previously, there are three cases of spare replacement. To obtain $P_n(t)$, we need to solve the systems (4)-(9) and (10)-(15). Each can be rewritten in the matrix form
$$\frac{d}{dt}P(t) = M\,P(t), \qquad (16)$$
If $C \le S$, then $M = (m_{ij}) \in \mathbb{R}^{(N+S+1)\times(N+S+1)}$ is a tridiagonal matrix whose entries are the coefficients of the system (4)-(9). Similarly, the coefficient matrix of the system (10)-(15) is tridiagonal, with entries given by the coefficients of that system. For fixed values of q and $\omega_n$, (16) is a linear homogeneous system with constant coefficients. It therefore has a closed-form analytical solution in terms of the matrix exponential (see Hasselblatt and Katok 50):
$$P(t) = \exp(Mt)\,P(0), \qquad (29)$$
where $P(0) = [1, 0, \dots, 0]^T \in \mathbb{R}^{(N+S+1)\times 1}$ is the initial condition vector. To obtain $\exp(Mt)$, we apply the Laplace transform (see Chaparro and Akan 51) to Eqs. (16) and (29), which gives
$$\mathcal{L}\{\exp(Mt)\} = (sI - M)^{-1}, \qquad (30)$$
where s is a complex variable and I is the identity matrix. Applying the inverse Laplace transform (see Dyke 52) to Eq. (30) then yields $\exp(Mt)$ and hence the exact value of $P_n(t)$. The computation is summarized in the following Algorithm 1.
Step 2: Use the command rand(0.0..1.0) to generate the value of the probability q and the command rand(0..1) to generate random integer values of $\omega_n$ (0 or 1).

Step 4: Use the Laplace transform and its inverse to compute (30), and store the result in a new matrix A.
Step 5: Generate a column vector of the initial condition IC.

Step 6: Compute A × IC, which gives the exact solution of (16).
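A minimal Python sketch of Steps 2-6 of Algorithm 1 follows, offered under stated assumptions rather than as the authors' Maple code. Steps 1 and 3 do not appear in the text, and neither do the matrix entries. The hypothetical `build_generator` therefore assumes the warm-spare rate forms $\lambda_n$ and $\mu_n = \min(n, C)\mu$. It also assumes that a repaired unit leaves state n with probability q when $\omega_n = 1$ and always when $\omega_n = 0$. Step 4 swaps the symbolic inverse Laplace transform for `scipy.linalg.expm`, a numerical scaling-and-squaring Padé method that returns the same $\exp(Mt)$ up to rounding. The usage line uses the values of Example 1 (C = 2, S = 3).

```python
import numpy as np
from scipy.linalg import expm


def build_generator(N, S, C, lam, mu, eps, q, omega):
    """Hypothetical tridiagonal matrix M of dP/dt = M P for states n = 0..N+S."""
    size = N + S + 1
    M = np.zeros((size, size))
    for n in range(size):
        # assumed failure rate lambda_n (warm-spare form)
        if n < S:
            birth = N * lam + (S - n) * eps
        elif n < N + S:
            birth = (N + S - n) * lam
        else:
            birth = 0.0
        # assumed effective repair rate: an inspected unit (omega_n = 1) is accepted
        # with probability q and fed back otherwise; an uninspected unit is accepted
        accept = q if omega[n] == 1 else 1.0
        death = min(n, C) * mu * accept
        M[n, n] = -(birth + death)
        if n + 1 < size:
            M[n + 1, n] = birth   # n -> n + 1: one more unit fails
        if n >= 1:
            M[n - 1, n] = death   # n -> n - 1: a repaired unit becomes a standby
    return M


def transient_probabilities(N, S, C, lam, mu, eps, t, seed=0):
    """Return P(t) = exp(Mt) P(0), mirroring Steps 2-6 of Algorithm 1."""
    rng = np.random.default_rng(seed)
    # Step 2: random q in [0, 1) and random 0/1 inspection indicators omega_n
    q = rng.uniform(0.0, 1.0)
    omega = rng.integers(0, 2, size=N + S + 1)
    # Step 3 (assumed): build the coefficient matrix
    M = build_generator(N, S, C, lam, mu, eps, q, omega)
    # Step 4: matrix exponential computed numerically instead of by inverse Laplace
    A = expm(M * t)
    # Step 5: initial condition P(0) = [1, 0, ..., 0]^T (no stopped units at t = 0)
    IC = np.zeros(N + S + 1)
    IC[0] = 1.0
    # Step 6: solution of (16) at time t
    return A @ IC


P = transient_probabilities(N=4, S=3, C=2, lam=2.3, mu=4.72, eps=2.0, t=1.0)
print(np.round(P, 4), P.sum())
```

The columns of M sum to zero, so the entries of P(t) should always add up to 1, which is a cheap sanity check. Integrating (16) directly, for example with `scipy.integrate.solve_ivp`, is another way to cross-check the matrix-exponential result.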

Cold spares case
Spare parts are one of the most important factors in maintenance, but keeping them in excess raises production costs. If spare parts are not available, maintenance procedures are halted and the system stops. When S = 0, either of the above systems reduces to the system (31)-(34). The analytical method above can then be used in the same way to obtain the exact value of $P_n(t)$. Example 1 discusses this solution for different system capacities.

Fault detection as a preventive maintenance method
The main objective of preventive maintenance is to take the necessary precautions and follow the necessary procedures, discovering equipment faults in order to prevent accidents. To detect the fault in the repaired unit, we use the exponential detection function $1 - e^{-z}$, where z is the search effort (see Hong et al. 48,49). This function is useful for detecting machines that will fail in the future because it exhibits a decreasing rate of return. The detection probability increases with the search effort, but each additional unit of effort adds less than the previous one. The probability that a unit is nominated for failure depends on the number of repaired and accepted units that have been inspected. For each spares case, the detection probability $P_D(t,n,z)$ of the unit nominated for stopping and the mean detection time $E_D(t,n,z)$ can then be computed. This is done by adding two further steps (Steps 7 and 8) to Algorithm 1.
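The decreasing rate of return can be checked directly. Since $\frac{d}{dz}(1 - e^{-z}) = e^{-z}$, which decreases in z, each extra unit of search effort adds less detection probability than the one before. The short sketch below tabulates both quantities. It covers only the detection function, not the full expression for $P_D(t,n,z)$, which is not reproduced here. The function names are hypothetical. The value z = 20 is the search effort used in Example 1.

```python
import math


def detection_probability(z):
    """Exponential detection function b(z) = 1 - exp(-z), for search effort z >= 0."""
    return 1.0 - math.exp(-z)


def marginal_return(z):
    """db/dz = exp(-z): extra detection probability per unit of additional effort."""
    return math.exp(-z)


# b(z) keeps increasing while its marginal return keeps shrinking
for z in (0.5, 1.0, 2.0, 5.0, 20.0):
    print(f"z = {z:5.1f}  b(z) = {detection_probability(z):.6f}  db/dz = {marginal_return(z):.2e}")
```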
Example 1 Consider an operating system that contains N = 4 machines and different numbers of spare units S with rate ε = 2. The failed units arrive at the service stage with exponential rate λ = 2.3, and the service time is exponentially distributed with rate µ = 4.72. To compute $P_n(t)$, we apply Maple 13 to Eq. (29), $P(t) = \exp(Mt)P(0)$, on an Intel(R) Core(TM) i7 CPU at 2.30 GHz with 16.0 GB of RAM. In the case $C \le S$, with C = 2 and S = 3, the exact vector solution $P(t)$, $t \in [0, 1]$, appears in Fig. 2. This solution of the system (4)-(9) is obtained from the corresponding coefficient matrices. In the case $C > S$, with C = 3 and S = 2, the solution of the system (10)-(15) is obtained from its coefficient matrices, giving the P(t) shown in Fig. 3. In addition, Fig. 4 shows the exact value of P(t) in the cold spare case, obtained from the corresponding coefficient matrices. With a search effort of z = 20 for early detection, Steps 7 and 8 give $P_D(t,n,z)$ and $E_D(t,n,z)$ for each replacement case, as in Figs. 5, 6 and 7, respectively. When $C \le S$, the detection probability attains its maximum value (see Fig. 5a, b) because there are more standby machines than servers. This also increases $E_D(t)$, as in Figs. 5b and 7b.

Steady state case
In the case $C \le S$, the steady-state probability difference equations follow from the system (4)-(9). For $C > S$, they follow from the system (10)-(15). Kotb and El-Ashkar 11 used the iterative method to obtain the probability of n units in the two cases. Hence, the detection probability of the unit nominated for stopping and the mean detection time are also obtained in the steady state. In the cold spares case, the steady-state probability difference equations follow from the system (31)-(34), and the iterative method again gives the steady-state probabilities. One can then use (53) and (54) to obtain $P_D(n,z)$ and $E_D(n,z)$, respectively.
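For a birth-death structure such as the tridiagonal systems above, the iterative method reduces to the balance relation $\pi_{n+1}\mu_{n+1} = \pi_n\lambda_n$, followed by normalization. The sketch below assumes the same hypothetical warm-spare rate forms as before, without feedback (every repaired unit is accepted). This is an assumption rather than the paper's equations (53) and (54). It computes the steady-state distribution for the parameter values of Example 2 (N = 50, S = 10, C = 4, λ = 0.03, µ = 0.4, ε = 0.4). It then checks that the result satisfies $M\pi = 0$.

```python
import numpy as np


def rates(N, S, C, lam, mu, eps):
    """Hypothetical failure (birth) and repair (death) rates for states n = 0..N+S."""
    n = np.arange(N + S + 1)
    birth = np.where(n < S, N * lam + (S - n) * eps, (N + S - n) * lam)
    death = np.minimum(n, C) * mu
    return birth.astype(float), death.astype(float)


def steady_state(birth, death):
    """Iterative solution pi_{n+1} = pi_n * birth[n] / death[n+1], then normalized."""
    pi = np.ones(len(birth))
    for n in range(len(birth) - 1):
        pi[n + 1] = pi[n] * birth[n] / death[n + 1]
    return pi / pi.sum()


def generator(birth, death):
    """Tridiagonal M with M[n+1, n] = birth[n] and M[n, n+1] = death[n+1]."""
    M = np.diag(-(birth + death))
    M += np.diag(birth[:-1], k=-1)   # subdiagonal: failures n -> n + 1
    M += np.diag(death[1:], k=1)     # superdiagonal: repairs n + 1 -> n
    return M


birth, death = rates(N=50, S=10, C=4, lam=0.03, mu=0.4, eps=0.4)
pi = steady_state(birth, death)
print("P_0 =", pi[0], " mean number of stopped units =", (np.arange(len(pi)) * pi).sum())
```

The product form is a convenient alternative to solving $M\pi = 0$ with a linear solver. It needs only positive repair rates for $n \ge 1$, which holds whenever $C \ge 1$ and $\mu > 0$.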

Example 2
The main purpose of maintenance is to help the project achieve the goals for which it was established. Its primary responsibility is to maximize the percentage of time the machines and equipment are available for operation. It also preserves the value of the factory by reducing the rates of machine wear and performance deterioration caused by operation. Good maintenance planning depends on realistic estimates of repair costs. Deviations from the estimated maintenance costs therefore upset production costs and can deplete the estimated budget. Continuous detection of defects in the machines ensures that they work accurately and reduces losses. Consider N = 50 machines and S = 10 spare units with rate ε = 0.4, and let λ = 0.03, µ = 0.4 and z = 20. For C = 4, 6, 8 (i.e., $C \le S$), we use (53) and (54) to compute $P_D(n,z)$ and $E_D(n,z)$, as shown in Fig. 8.
Also, in the case $C > S$, where C = 8 and S = 5, 6, 7, the values of $P_D(n,z)$ and $E_D(n,z)$ appear in Fig. 9.
Figure 10 shows the values of $P_D(n,z)$ and $E_D(n,z)$ in the cold spare case. When the number of spares increases and all other parameters are held fixed, $P_D$ increases steadily and noticeably, as in Fig. 10a. In contrast, when the number of servers increases with the remaining parameters fixed, $P_D$ increases irregularly, as shown in Fig. 9a. This is due to the change in the number of units leaving after the inspection process. The fluctuation in the detection probability reduced the time needed to detect the fault, which had a significant impact on $E_D$, as shown in Figs. 9b and 10b.
waiting in a queue.There are a limited number of servers responsible for the repair process, with an inspector to ensure the quality of the service.At the same time, there are units (spares or standby) ready for replacement in the event of any malfunction.We studied the replacement process for three types of spares, which are warm, hot, and cold spares.We discussed the transient behaviour of this probabilistic model using the Laplace transform.An algorithm for calculating the exact value of the probability n units ( n ≤ N ) over time is presented.On the other hand, preventive maintenance was studied through the early detection process of faulty units in order to avoid a long interruption of production as a result of the unit replacement process.The probability of early detection of the target and the expected value of the detection time were obtained.In addition, the detection probability and the mean time of detection are discussed in the steady-state case.
In future work, this model can be used to study preventive maintenance when the amount of detection effort is a random variable with a known distribution. Furthermore, the flexibility of this model allows us to investigate more complex real-life problems that involve additional conditions modelled by Poisson equations.

Figure 4. $P_n(t)$ in the cold spare case, where C = 3.

Figure 7. (a) $P_D(t)$ and (b) $E_D(t)$ in the cold spare case, where C = S = 2.

Figure 8. (a) $P_D$ and (b) $E_D$ when $C \le S$.

Figure 9. (a) $P_D$ and (b) $E_D$ when $C > S$.
Figure 1. The repair mechanism of failed machines.